# Zero-Shot Classification

| Model | Author | License | Description | Task | Tags | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Clip Vitl14 Test Time Registers | amildravid4292 | MIT | Built on OpenCLIP-ViT-L-14, with test-time registers added to improve the model's interpretability and downstream task performance. | Text-to-Image | Transformers | 236 | 0 |
| Sail Clip Hendrix 10epochs | cringgaard | | A vision-language model fine-tuned from openai/clip-vit-large-patch14 for 10 epochs. | Text-to-Image | Transformers | 49 | 0 |
| Git RSCLIP | lcybuaa | Apache-2.0 | A vision-language model pretrained on the Git-10M dataset, specializing in multimodal understanding of remote sensing imagery. | Text-to-Image | Safetensors | 59.37k | 4 |
| Thesis Clip Geoloc Continent | jrheiner | | A CLIP-ViT model fine-tuned for continent-level image geolocation. | Image-to-Text | Transformers, English | 82 | 0 |
| Git Base | microsoft | MIT | GIT, a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for image-to-text generation. | Image-to-Text | Transformers, multilingual | 365.74k | 93 |
| Taiyi CLIP RoBERTa 326M ViT H Chinese | IDEA-CCNL | Apache-2.0 | The first open-source Chinese CLIP model, pre-trained on 123 million image-text pairs with RoBERTa-large as the text encoder. | Text-to-Image | Transformers, Chinese | 108 | 10 |
| Japanese Cloob Vit B 16 | rinna | Apache-2.0 | A Japanese CLOOB (Contrastive Leave-One-Out Boost) model trained by rinna Co., Ltd. for cross-modal understanding of images and text. | Text-to-Image | Transformers, Japanese | 229.51k | 12 |
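
The CLIP-family checkpoints above can typically be used for zero-shot classification through the `transformers` zero-shot image classification pipeline. A minimal sketch, assuming `transformers` and `Pillow` are installed; the image path and candidate labels are placeholder inputs, and `openai/clip-vit-large-patch14` (the base model named in the Sail Clip Hendrix entry) stands in for whichever compatible checkpoint you pick:

```python
# Zero-shot image classification with a CLIP checkpoint via the
# transformers pipeline. The image path and labels are placeholders.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14",  # any compatible CLIP checkpoint
)

results = classifier(
    "cat.jpg",  # local path, URL, or PIL.Image
    candidate_labels=["a photo of a cat", "a photo of a dog", "a photo of a car"],
)
for r in results:  # sorted by score, highest first
    print(f"{r['label']}: {r['score']:.3f}")
```

The pipeline embeds the image and each candidate label with the model's image and text encoders, then softmaxes the cosine similarities, which is the standard CLIP zero-shot recipe.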
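
The Image-to-Text entries follow a generate-style captioning flow instead. A sketch for GIT, assuming the hub ID is `microsoft/git-base` (matching the Git Base entry above); the COCO image URL is just a sample input:

```python
# Caption an image with GIT: encode pixels with the processor, then
# autoregressively generate caption tokens with the decoder.
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample image
image = Image.open(requests.get(url, stream=True).raw)

pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```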